STUDY OF THE PRESENT PROBLEMS OF THE SCIENTIFIC
INFORMATION PROCESSING AND TRANSMISSION 


Paul STERIAN, Dan Alexandru IORDACHE, Viorica IORDACHE




Abstract. In order to evaluate the quality of information, this work points out
some criteria for studying the compatibility of: a) information with the true physical
parameters, b) different correlations (and their corresponding theoretical models) with
the experimental data. The possibilities of evaluating the error risk incurred when
rejecting the compatibility with the experimental data, as well as the basic features of
information transmission (e.g. resonance bells), were also studied. The necessity of
using the rigorous mathematical procedures of both Numerical Analysis and Mathematical
Statistics is strongly underlined.


Keywords: Mathematical Analysis, Data Processing, Numerical Physics, Mathematical Statistics, 
Numerical Methods 


1. Introduction 


The sciences are developing extremely fast. According to the data published by
the Institute for Scientific Information (ISI, Philadelphia, US), in Physics alone
about 77350 ISI-indexed works per year were published in the interval 1981-1996,
so that classification indices formed by 4 figures and a letter are frequently
necessary (e.g. the symbol 4281W corresponds to the field of optical fiber
sensors; fiber gyros).


For this reason: a) certain errors sometimes intervene, as in the cases of:
(i) the "anomalons" [1], or (ii) the assumed nuclear fusion at low temperatures
(starting from some palladium compounds); b) some important works (from the
fields of Nuclear Physics or of Superconductors) sometimes have 30...50
authors, in order to avoid possible errors! High quality solutions to these
difficulties are given by Numerical Analysis, which ensures the scientific
study of all theoretical models. The main stages of the modern processing of
experimental results are presented in Diagram 1; they are accomplished by means
of the methods of Mathematical Statistics and of Numerical Analysis, together
with its correspondent in Physics, Numerical Physics [2].


2. Present possibilities and limits of the scientific information processing 


The outstanding development of computation techniques has allowed not only the
description of physical systems in different conditions and the analysis of the
compatibility of such descriptions with the experimental data, but also the
simulation of some physical and technical processes in special conditions
(difficult to reach in laboratories). Because simulation is considerably cheaper
and allows the study of phenomena produced in inaccessible conditions, it is of
considerable interest from both the technical and the didactic points of view [3].
Taking into account the considerable future increase of computing power due to
the use of parallel computers, some special simulation techniques have appeared,
such as the Local Interaction Simulation Approach (LISA) [4], mainly intended for
such computers. The main difficulty met by simulation techniques is the
appearance of some numerical phenomena: instabilities, divergence [5],
dispersion, distortion [6], etc., which sometimes lead to considerable errors in
the obtained numerical simulations.


[Diagram 1. Basic Stages of the present Scientific Information Processing.
Recoverable labels: rough (unprocessed) information; notions and concepts obtained
by intuition; interpretations (statistical hypotheses) tested by tests of
incompatibility; complex simulations; field of the "absolute" (Mathematics);
misinformation (recent examples: nuclear fusion at low temperatures, anomalons,
118th element).]

The most misleading numerical phenomenon is pseudo-convergence [5], because it
leads to apparently correct simulations (their shape usually being qualitatively
correct) that are nevertheless quantitatively erroneous.


For this reason, the present work considers that numerical simulations have to
be studied in strong connection with the basic results of experimental data
processing.
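
The danger of judging convergence from shrinking increments alone can be illustrated by a minimal sketch (an illustration added here, not taken from [5]; the function name `naive_series_sum` and the tolerance values are assumptions). The stopping rule accepts a partial sum as soon as a single term falls below a tolerance, which looks like convergence even for the divergent harmonic series:

```python
def naive_series_sum(term, tol=1e-4, max_terms=10_000_000):
    """Sum term(1) + term(2) + ... until one term drops below tol.

    The stopping rule only checks that the increments become small,
    which is NOT a proof of convergence (hypothetical helper).
    Returns (partial_sum, number_of_terms_used).
    """
    total = 0.0
    for n in range(1, max_terms + 1):
        t = term(n)
        total += t
        if abs(t) < tol:
            return total, n
    return total, max_terms


if __name__ == "__main__":
    # Harmonic series: the terms 1/n shrink, yet the sum diverges.
    for tol in (1e-3, 1e-4, 1e-5):
        total, n = naive_series_sum(lambda k: 1.0 / k, tol=tol)
        print(f"tol={tol:g}: stopped after {n} terms, sum={total:.4f}")
```

Each tenfold tightening of the tolerance shifts the "converged" harmonic sum by about ln 10 ≈ 2.3, which reveals that the apparent limit is an artifact of the stopping rule rather than a property of the series.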


3. Evaluation of the Error Risk at the Rejection of the Compatibility of a 
Theoretical Model with the existing Experimental Data 


Given the huge volume of experimental results, one first seeks to synthesize
them by means of some semi-empirical relations, in order to finally associate
them with (or to obtain) some corresponding laws. Because any physical relation
or law represents only an approximation of the empirical truth, all these
relations will be found incompatible once the accuracy of the considered
measurements is high enough. The basic decision in the statistical study of
experimental results is therefore the rejection of the compatibility of some
relations (or theoretical models) with the experimental results. As for any
statistical hypothesis, a certain error risk is associated with the hypothesis
of compatibility rejection; this risk always has to be known, but it is ...
rarely studied!


As is well known [7], any set of experimental results referring to N different
parameters corresponding to the same state of the studied system (let
$X_{1,mp}, \ldots, X_{N,mp}$ be the most probable values of these parameters) is
associated with a confidence domain which, in the frequent case of a normal
distribution, has the shape of an N-dimensional ellipsoid:


$$\varepsilon^{T}\,\Gamma^{-1}\,\varepsilon = f_N(L_i), \qquad (1)$$


where $\varepsilon$ is the "column" vector of errors ($\varepsilon_i = X_i - X_{i,mp}$),
$\varepsilon^{T}$ is its transpose ("row" vector), $\Gamma$ is the covariance matrix
(each of its elements being equal to the statistical average of the product of the
corresponding errors: $\Gamma_{ij} = \langle \varepsilon_i \varepsilon_j \rangle$), and
$f_N(L_i)$ is a certain function of the confidence level $L_i$ corresponding to the
considered confidence domain. In the frequent case of the study of a pair of
physical parameters X and Y, the confidence domain corresponding to the normal
(2-dimensional) distribution is the interior of the ellipse:


$$\left[\frac{x_k - x_{k,cmp}}{s(x_k)}\right]^{2} + \left[\frac{y_k - y_{k,cmp}}{s(y_k)}\right]^{2} - 2 r_k \left[\frac{x_k - x_{k,cmp}}{s(x_k)}\right]\left[\frac{y_k - y_{k,cmp}}{s(y_k)}\right] = f_2(L_k), \qquad (2)$$

where $s(x_k)$, $s(y_k)$ are the square mean deviations corresponding to the values of
the parameters X and Y for the state k, the subscript cmp marks the most probable
values, and $r_k$ is the correlation coefficient of the values of these parameters
for the indicated state:

$$r_k = \frac{D(x_k, y_k)}{s(x_k)\, s(y_k)} = \frac{\langle (x_k - x_{k,cmp})(y_k - y_{k,cmp}) \rangle}{s(x_k)\, s(y_k)}. \qquad (3a)$$

Usually, the correlation coefficient $r_k$ is considered as the main criterion for
judging the compatibility of a relation y = f(x) with certain sets of
experimental data. In fact, this coefficient "evaluates" only how near the
centers of the confidence domains lie to the studied regression line (curve);
e.g. although $|r_a| > |r_b|$ (the correlation coefficients of the sets from
Fig. 1.a and Fig. 1.b, respectively), the ensemble of experimental values from
Fig. 1.a is not compatible with the relation y = f(x), while the set from Fig. 1.b
is compatible with this relation, because the corresponding confidence domains
are crossed by the regression line (function) y = f(x). Obviously, such problems,
of high importance for experimental data processing, can be solved only by means
of computers. In particular, too small values (e.g., less than 0.01) of
$q_k = 1 - L_{tk}$ [obtained from relations (2) and (3b) for $x_k = x_{tk}$,
$y_k = y_{tk}$, where $x_{tk}$, $y_{tk}$ are the coordinates of the tangency point of
the confidence ellipse to the regression line (function) y = f(x) (see the broken
ellipse in Fig. 1.b)] can justify the hypothesis of incompatibility of the studied
relation y = f(x) with the considered ensembles of experimental data [7]. If the
error risk $q_k = 1 - L_{tk}$ at the rejection of the compatibility of the
experimental results for the state k with the studied theoretical relation
y = f(x) is smaller than a certain threshold (usually chosen between 0.001 and
0.2), the studied compatibility is rejected; if it is larger, the compatibility
is accepted.


and:

$$f_2(L_k) = -2\,(1 - r_k^{2})\,\ln(1 - L_k). \qquad (3b)$$
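
For the special case of a straight line y = m·x + c, the error risk at the tangency point can be sketched in a few lines. This is a minimal sketch under stated assumptions: the left-hand side of the ellipse relation is taken as F = u² + v² − 2·r·u·v (with u, v the reduced deviations), and F = −2(1 − r²)·ln(1 − L), so that q = 1 − L = exp(−F_min / (2(1 − r²))), where F_min is the smallest value of F along the line. The function name and argument names are hypothetical.

```python
import math


def error_risk_for_line(x0, y0, sx, sy, r, slope, intercept):
    """Error risk q = 1 - L_t for rejecting the line y = slope*x + intercept.

    (x0, y0): most probable values; sx, sy: standard deviations;
    r: correlation coefficient (|r| < 1). Hypothetical helper.
    """
    if not abs(r) < 1.0:
        raise ValueError("|r| must be smaller than 1")
    # Vertical offset of the line from the center of the confidence domain.
    d = slope * x0 + intercept - y0
    # Along the line, F(t) = a2*t**2 + a1*t + a0 with t = x - x0.
    a2 = 1.0 / sx**2 + slope**2 / sy**2 - 2.0 * r * slope / (sx * sy)
    a1 = 2.0 * d * (slope / sy**2 - r / (sx * sy))
    a0 = d**2 / sy**2
    # Minimum of the quadratic: the ellipse tangent to the line.
    f_min = max(a0 - a1**2 / (4.0 * a2), 0.0)
    return math.exp(-f_min / (2.0 * (1.0 - r**2)))


if __name__ == "__main__":
    q = error_risk_for_line(0.0, 0.0, 1.0, 1.0, 0.0, slope=0.0, intercept=2.0)
    threshold = 0.01  # assumed value inside the 0.001 ... 0.2 range
    print(f"q = {q:.4f}; compatibility rejected: {q < threshold}")
```

A line passing through the center of the confidence domain gives q = 1 (no grounds for rejection), while q decreases as the line moves away from the center.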


[Fig. 1.a. Incompatible theoretical relation relative to the considered confidence domains (axes x, y; curve y = f(x)).]
[Fig. 1.b. Theoretical relation compatible with the considered confidence domains (axes x, y; curve y = f(x)).]


The accomplished studies (see e.g. [2a], p. 44) pointed out the possibility of
judging the global compatibility (for all existing experimental data and
confidence domains) of a theoretical relation with these experimental data, by
means of the global compatibility criterion A (A > 1 means compatibility),
defined as:


$$A = \frac{V_N(r_N)}{(1 - |r_N|)^{2}}, \qquad (4)$$


where $V_N(r_N)$ is the variance of the correlation coefficient corresponding to the
centres of all N considered confidence domains (A << 1 corresponds to
incompatibility).


4. Description Possibilities of the Apparent and True Information Amounts 
obtained by Measurements and Transmitted by Publications, respectively 


As is well known, the main target of the usual determinations is represented by
the physical parameters, whose individual values are statistically distributed,
due to the presence of fluctuations.

The most frequently met distribution of the individual (fluctuating) values of a
parameter X is the one-dimensional normal (Gauss) distribution, described by the
probability density:


$$p(x) = \frac{1}{\sigma\sqrt{2\pi}}\,\exp\left[-\frac{(x - \bar{x})^{2}}{2\sigma^{2}}\right]. \qquad (5)$$


4.1. Information amounts corresponding to Shannon’s definition 


Starting from the Shannon [8] - Hincin [9] expression of the uncertainty degree
associated with a discrete statistical set of physical events:


$$H(P_i \mid i = 1, \ldots, n) = -a \sum_{i=1}^{n} P_i \log_b P_i, \qquad (6)$$

where a > 0 and b > 1, one finds the uncertainty degree associated with a
continuous statistical ensemble described by the probability density p(x), using
the relation:


$$H(p(x)) = -a \int p(x)\, \log_b\big(p(x)\,\Delta x\big)\, dx, \qquad (7)$$


where $\Delta x$ is the suitably chosen "quantum" of the variable x.
From relations (5) and (7), one obtains (see also [10]) the expression of the
uncertainty degree for the one-dimensional normal (Gauss) distribution:


$$H(p_{Gauss}(x)) = \frac{a}{\ln b}\left[\frac{1}{2} + \ln\frac{\sqrt{2\pi}\,\sigma}{\Delta x}\right]. \qquad (8)$$
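
The closed form for the normal distribution can be checked numerically. The sketch below assumes the closed form H = (a / ln b)·[1/2 + ln(√(2π)·σ/Δx)], which follows from inserting the Gauss density into the continuous uncertainty integral; the function names, the integration range of ±12σ and the midpoint rule are choices made here for illustration only.

```python
import math


def gauss_uncertainty_closed_form(sigma, dx, a=1.0, b=2.0):
    """Closed-form uncertainty degree of a normal distribution."""
    return a / math.log(b) * (0.5 + math.log(math.sqrt(2.0 * math.pi) * sigma / dx))


def gauss_uncertainty_numeric(sigma, dx, a=1.0, b=2.0,
                              half_width=12.0, steps=200_000):
    """Midpoint-rule estimate of -a * integral of p(x) * log_b(p(x) * dx).

    The density is centered at zero; the mean does not affect the result.
    """
    lo = -half_width * sigma
    h = 2.0 * half_width * sigma / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        p = math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
        if p > 0.0:
            total += p * math.log(p * dx, b) * h
    return -a * total


if __name__ == "__main__":
    sigma, dx = 2.0, 0.01
    print(gauss_uncertainty_closed_form(sigma, dx))
    print(gauss_uncertainty_numeric(sigma, dx))
```

With a = 1 and b = 2 the result is expressed in bits; the two printed values should agree to many decimal places.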


Starting from certain measurements, one finds the confidence interval associated
with the true value:

$$D_{x_t} = \left[\bar{x}_n - z_P\,\sigma(\bar{x}_n),\; \bar{x}_n + z_P\,\sigma(\bar{x}_n)\right], \qquad (9)$$


but the accurate localization of the true value ("mathematical expectation")
inside the confidence interval is not possible. That is why the uncertainty
degree for this interval will be calculated assuming a uniform distribution
inside this interval:


$$p(x_t) = 0 \ \text{for}\ x_t \notin D_{x_t}, \qquad p(x_t) = C\ (\text{const.}) \ \text{for}\ x_t \in D_{x_t}. \qquad (10)$$


Using the normalization condition:

$$1 = \int_{-\infty}^{\infty} p(x_t)\, dx_t = \int_{\bar{x}_n - z_P\sigma(\bar{x}_n)}^{\bar{x}_n + z_P\sigma(\bar{x}_n)} C\, dx_t = 2C \cdot z_P\,\sigma(\bar{x}_n),$$


one obtains the expression of the uncertainty degree corresponding to the uniform
statistical distribution associated with the confidence interval of the true value:


$$H(p(x_t)) = -a \int_{-\infty}^{\infty} p(x_t)\, \log_b\big(p(x_t)\,\Delta x\big)\, dx_t = a \log_b\frac{2\, z_P\,\sigma(\bar{x}_n)}{\Delta x}. \qquad (11)$$


One finds that both the uncertainty degree associated with the one-dimensional
normal distribution and that associated with the uniform distribution inside the
confidence interval of the true value involve the logarithm of the mean square
deviation, corresponding to the individual values and to the true value,
respectively.
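
The intermediate step for the uniform distribution can be written out as a short worked derivation. It is a sketch in which the symbols $x_t$ (true value), $\bar{x}_n$ (mean of n determinations) and $z_P$ (confidence coefficient) are notational assumptions:

```latex
\begin{align*}
1 = 2C\,z_P\,\sigma(\bar{x}_n)
  \;&\Rightarrow\; C = \frac{1}{2\,z_P\,\sigma(\bar{x}_n)},\\
H &= -a \int_{D} C \,\log_b (C\,\Delta x)\, dx_t
   = -a \,\log_b (C\,\Delta x)
   = a \,\log_b \frac{2\,z_P\,\sigma(\bar{x}_n)}{\Delta x}.
\end{align*}
```

The logarithm is constant over the interval, so it leaves the integral, and the remaining integral of C over D equals 1 by normalization.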


4.2. The apparent Information amount 


The apparent information amount obtained by means of the n-th determination of
a certain set of measurements is defined as the difference between the residual
uncertainty degrees after the (n − 1)-th and after the n-th determination,
respectively.


Taking into account the expressions (8) and (11) of the uncertainty degree for
the individual and the true values of the studied parameters, it follows that
the apparent information obtained by means of the n-th determination can be
expressed starting from the mean square deviations for the sets of results
obtained after the (n − 1)-th and after the n-th determination, respectively, as:


$$I_{app,n} = H_{n-1} - H_n = a \cdot \log_b\frac{\sigma_{n-1}}{\sigma_n}. \qquad (12)$$


4.3. Modeling of the true Information amount 


In order to understand the difference between the apparent and true information
amounts, we will consider the example of a series of determinations of a
(physical, in particular) parameter, the first n (≥ 2) individual values being,
by chance, excessively near to each other, while the (n+1)-th is compatible with
the previous ones, but rather far from them. Due to the increase of the mean
square deviation of the individual and average values after the (n+1)-th
determination, the apparent information amount will be negative [see also
relation (12)], while in fact this (n+1)-th determination brings an important
amount of true information.
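
The scenario above can be reproduced numerically. The following is a minimal sketch, assuming that the apparent information of the last determination is a·log_b(s_{n−1}/s_n), with s_m the standard deviation of the mean of the first m values; the choice a = 1, b = 2 (bits), the function name and the sample values are illustrative assumptions.

```python
import math
import statistics


def apparent_information(values, a=1.0, b=2.0):
    """Apparent information brought by the last value in `values`.

    Computed as a * log_b(s_prev / s_last), where s_prev and s_last are
    the standard deviations of the mean before and after the last
    determination. At least 3 values are needed so that s_prev exists.
    """
    if len(values) < 3:
        raise ValueError("need at least 3 determinations")

    def sd_of_mean(v):
        return statistics.stdev(v) / math.sqrt(len(v))

    s_prev = sd_of_mean(values[:-1])
    s_last = sd_of_mean(values)
    return a * math.log(s_prev / s_last, b)


if __name__ == "__main__":
    # First four values are (by chance) very close; the fifth lies farther away.
    close_then_far = [10.00, 10.01, 9.99, 10.00, 10.30]
    print(apparent_information(close_then_far))  # negative value

    # A new value near the running mean reduces the deviation of the mean.
    well_behaved = [10.0, 10.2, 9.8, 10.0]
    print(apparent_information(well_behaved))    # positive value
```

The first series yields a negative apparent information although the fifth determination is a legitimate, informative measurement, which is exactly the discrepancy the true information amount is meant to remove.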


A possible definition of the true information amount starts from the finding
that the description of complex systems (of physical parameters, in particular)
is strongly connected to certain probability distributions. The experimental
findings (or communications) about the values of the studied parameter of the
considered complex object correspond to a certain 1D distribution, of course
somewhat different from the true one. As results from the examination of
Figure 2, there is indeed an overlapping zone of these distributions, whose
magnitude increases for more accurate descriptions of the studied parameter
(object). When both these one-dimensional distributions are normalized to 1, it
is possible to define the true (physical) information amount by means of the
expression (see also [11]):

$$\mathfrak{I} = 2 \cdot \text{Overlapping Area} - 1. \qquad (13)$$
[Fig. 2. Definition of the true information amount. Recoverable labels:
distributions of the individual values for the experimental results (or the
transmitted information) and for the true parameter (object); measure of the
true information = 2 × common area − 1, if the area under each distribution is
normalized to 1; true information < 0 means misinformation.]
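
The overlap definition can be evaluated numerically. The sketch below assumes that both the true and the reported distributions are normal (the text does not prescribe their shape); the function names, the ±10σ integration range and the midpoint rule are illustrative choices.

```python
import math


def normal_pdf(x, mean, sigma):
    """Probability density of a normal distribution."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def true_information(mean_true, sigma_true, mean_rep, sigma_rep, steps=200_000):
    """2 * (overlapping area of the two normalized densities) - 1.

    The overlapping area is the integral of the pointwise minimum of the
    two densities, estimated here with the midpoint rule.
    """
    lo = min(mean_true - 10.0 * sigma_true, mean_rep - 10.0 * sigma_rep)
    hi = max(mean_true + 10.0 * sigma_true, mean_rep + 10.0 * sigma_rep)
    h = (hi - lo) / steps
    overlap = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        overlap += min(normal_pdf(x, mean_true, sigma_true),
                       normal_pdf(x, mean_rep, sigma_rep)) * h
    return 2.0 * overlap - 1.0


if __name__ == "__main__":
    print(true_information(0.0, 1.0, 0.0, 1.0))    # identical: close to +1
    print(true_information(0.0, 1.0, 0.5, 1.0))    # partial overlap
    print(true_information(0.0, 1.0, 100.0, 1.0))  # disjoint: close to -1
```

Under this reading, the value ranges from +1 (reported distribution identical to the true one) to −1 (no overlap at all, i.e. pure misinformation), and changes sign when the overlapping area equals one half.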


Because obtaining an additional true information amount requires the reduction
of the mean square deviation of the average value s(<x(C)>), one finds that this
goal is considerably more difficult to achieve when the existing knowledge
already corresponds to a small mean square error s(<x(C)>). In particular, this
is the case of the "consolidated" disciplines (mathematics, technical sciences,
theoretical physics, etc.). This could be one cause of a lower rationalized
individual impact factor, defined as the ratio of the total number of citations
to the number of specialists in the studied (sub)domain.


Unlike the above situation, the works elaborated in scientific fields with a
reduced consolidation degree can present rather high values of the true
information amount and of the rationalized impact factor, but can also rather
frequently present negative values of the true information amount (hence
misinformation), especially in domains of high complexity [11]. Examples are the
works referring to: a) the N(ancy) rays [12] (53 favorable papers published in
the first part of 1904 by the Academy of Sciences of France, and the Leconte
prize awarded in December 1904 to their "discoverer" (March 1903), the
corresponding member of the French Academy, Prof. René Blondlot); b) the "giant"
elementary particles (anomalons, see [1] and Table 1); c) "nuclear fusion" in
very simple conditions: (i) cold fusion, starting from the electrolysis of some
palladium salts, or (ii) "in bubbles" (Dr. Rusi Taleyarkhan, Purdue University,
publications during 2002-2006, [12], pp. 59-60); d) the trans-uranium element 118
(Dr. Ninov, Lawrence Berkeley National Laboratory, [12], p. 67), etc. We have to
underline here also that the "border" between negligent experimental data
processing and scientific fraud is rather "narrow" (e.g., Dr. Ninov never
admitted to having committed any fraud). Taking into account that (unlike the
works on N rays) the investigations referring to the existence of anomalons were
carried out in several different countries and were beyond any suspicion of
fraud, it follows that this situation was due to the particular difficulty of
the corresponding data processing. For this reason, we find it interesting to
present (in Table 1) the "oscillations" between the favorable and negative
opinions of specialist researchers up to the final rejection of the hypothesis
of the existence of these particles (anomalons).


No. | First author      | Journal          | Vol. | Page | Year | Opinion
1   | E. M. Friedlander | Phys. Rev. Lett. | 45   | 1084 | 1980 | Favorable
2   | …                 | Phys. Rev. Lett. | 48   | 02   | 1982 | Favorable
3   | …                 | Phys. Rev. Lett. | 48   | 856  | 1982 | Favorable
4   | T. M. Liss        | Phys. Rev. Lett. | 49   | 775  | 1982 | Yes; indirectly
5   | M. H. MacGregor   | Phys. Rev. Lett. | 49   | 1815 | 1982 | Review
6   | P. L. Jain        | Phys. Rev. C     | 25   | 3216 | 1982 | Negative
7   | E. M. Friedlander | …                | …    | …    | …    | …
8   | H. A. Gustafsson  | Phys. Rev. Lett. | 51   | 363  | 1983 | Yes; indirectly
9   | M. L. Tincknell   | Phys. Rev. Lett. | 51   | …    | 1983 | Favorable
10  | R. J. Clark       | …                | …    | …    | …    | …
11  | J. D. Stevenson   | Phys. Rev. Lett. | 52   | 515  | 1984 | Negative
12  | T. J. M. Symons   | Phys. Rev. Lett. | 52   | 982  | 1984 | Negative
13  | A. Z. M. Ismail   | …                | …    | …    | …    | …
…
17  | A. A. Katanyslev  | …                | …    | …    | …    | …
18  | H. Drechsel       | …                | …    | …    | …    | …
19  | D. Ghark          | …                | …    | …    | …    | …
20  | Bharga…           | …                | …    | …    | …    | …
21  | M. M. Aggarwal    | Phys. Rev. C     | 32   | 6    | 1985 | …
22  | Derick            | …                | …    | …    | …    | …
23  | H. Drechsel       | …                | …    | …    | …    | …
24  | G. Baroni         | …                | …    | …    | …    | …
25  | E. P. F. Banik    | …                | …    | …    | …    | …


Table 1. The main experimental works referring to "anomalons" (according to [1])


4.4. Description of the received information 


The accomplished study pointed out: a) that at present the best technical means
of information transmission is that of optical communications [13], [14], as
well as b) the most important features of the basic components of modern optical
communication systems: laser sources [15], optical fibers [16], optical
amplifiers [17], the use of solitons for signal multiplexing [18], etc.


As concerns the information transmission by publications, we will consider that
it is a resonant process, somewhat similar to forced oscillations. It is well
known that the dependence on the frequency $\omega$ of the external periodic
excitation of the energy transmitted to an oscillator with the characteristic
(eigen) frequency $\omega_0$ is described by the expression (see also Fig. 3,a):


$$W(\omega) = \frac{W(\omega_0)}{1 + \left[\dfrac{2(\omega - \omega_0)}{B}\right]^{2}}. \qquad (14)$$


[Fig. 3,a. Resonance bell of a forced oscillator. Fig. 3,b. Similar bell at information transmission.]

For the frequencies $\omega_\pm$ at which the transmitted energy reduces to half of
the maximum value $W(\omega_0)$, we have:

$$1 + \left[\frac{2(\omega_\pm - \omega_0)}{B}\right]^{2} = 2, \quad \text{hence:} \quad \omega_\pm = \omega_0 \pm \frac{B}{2}; \qquad (15)$$

it follows that the physical meaning of the parameter B is that of the
"half-width" of the resonance bell, i.e. its width at half of the maximum (the
width of the pass band). We mention also that the width of the pass band and the
merit factor Q for the resonance selectivity are related by the relation:

$$Q = \frac{\omega_0}{B}. \qquad (16)$$
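
A resonance bell of this kind can be sketched numerically. The code below assumes a Lorentzian curve W(ω) = W(ω₀) / [1 + (2(ω − ω₀)/B)²], in which B is the full width of the pass band at half of the maximum, and the merit factor Q = ω₀/B; the function names and the numerical values are illustrative assumptions.

```python
def transmitted_energy(omega, omega0, band_width, w_max=1.0):
    """Lorentzian resonance bell; band_width is the width at half maximum."""
    return w_max / (1.0 + (2.0 * (omega - omega0) / band_width) ** 2)


def quality_factor(omega0, band_width):
    """Merit factor Q of the resonance, Q = omega0 / band_width."""
    return omega0 / band_width


if __name__ == "__main__":
    omega0, band_width = 100.0, 10.0
    # At omega0 +/- band_width / 2 the energy drops to half of the maximum.
    for omega in (omega0 - band_width / 2, omega0, omega0 + band_width / 2):
        print(omega, transmitted_energy(omega, omega0, band_width))
    print("Q =", quality_factor(omega0, band_width))
```

A narrow pass band (small B) gives a large Q, i.e. a highly selective resonance; in the analogy with publications this corresponds to a work that only a narrow group of readers can absorb.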


Similarly, the plot of the information amount transmitted by means of
publications, in terms of the ensemble of the uniqueness parameters u
corresponding to the readers' knowledge, will present the shape of a "bell" with
a certain width B(u) of the pass band (see Fig. 3,b).¹


¹ It follows that, in order to ensure a high impact factor, it is necessary to
achieve a sufficiently high accessibility of the published work, by means of a
friendly (hence not too rigorous) presentation of the approached matters.


4.5. Scientometric evaluations


Because experimentally evidencing and describing narrow distributions p(x)
requires some special knowledge, it follows that the "bell" $\mathfrak{I}(u)$
corresponding to the description of such information will also be narrow.


For this reason, the impact factors corresponding to some works from
unconsolidated scientific domains (broad p(x) distributions and $\mathfrak{I}(u)$
"bells") will usually be considerably larger than the corresponding values for
the usual works belonging to consolidated scientific fields.


One thus finds that the impact factors vary inversely with the consolidation
degree (hence with the involved mathematics content) of the considered scientific
(sub)field.


Because the amount of science involved in a certain domain increases with its
quantitative (mathematical) content [the trend of all scientific fields being to
involve more quantitative (mathematical) elements, as can also be seen from the
cover page of Mathematical Reviews, which includes the majority of the domains
of Physics, Chemistry, Biology, Economic and Social sciences, etc.], it follows
that, paradoxically, the number of citations and the impact factor are (usually)
larger for the scientific works with a reduced degree of quantitative
(mathematical) elements.


On these matters, we also cite the opinion of the mathematician Jean-Pierre
Bourguignon, director of the Institut des Hautes Études Scientifiques and
president of the Ethics Committee of the CNRS (Centre National de la Recherche
Scientifique) of France:


"What worries me most is that the evaluation committees, instead of reading the
researchers' works, confine themselves to a kind of analysis of their impact.
Yet the fact that an article is heavily cited by other researchers is not
necessarily positive! It may, on the contrary, be singled out for its
weaknesses. Massification makes us depend more and more on highly automated
systems; yet the fewer people have read the articles, criticized them and formed
a real opinion, the more fragile the situation will become" [12], p. 69.


We thus find that the impact factors can be used for scientometric
classifications only for:


a) scientific sub-domains with a comparable quantitative (mathematical)
content,


b) works published in scientific reviews of the same type [2c], respectively.


Conclusions

The accomplished study led to the following conclusions:


(1) The quality of a piece of information can be judged by means of some
statistical criteria referring to its compatibility with the existing
experimental data for:


a) a given state, using the error risk criterion: $q_k = 1 - L_{tk}$,


b) a set of experimentally obtained confidence domains, by means of the
global compatibility criterion:


$$A = \frac{V_N(r_N)}{(1 - |r_N|)^{2}},$$


c) the entire distribution of the individual values of a considered
parameter, using the true information amount [see relation (13)]:


$$\mathfrak{I} = 2 \cdot \text{Overlapping Area} - 1.$$


(2) Computer simulations are often useful, but they have to be tested in order
to point out and predict the features of the numerical phenomena that may be
present.


(3) The most efficient present technical procedure for information
transmission is that of optical communications, involving optical fibers,
amplifiers, lasers (as sources), solitons (for signal multiplexing), etc.


(4) The efficiency of the information transmission by means of publications 
depends strongly on the “resonance bell” of the readers’ knowledge, hence the 
impact factors can be used for scientometric classifications only for: 


a) scientific sub-domains with a comparable quantitative (mathematics) 
content, 


b) published works in scientific reviews of the same type, respectively. 


(5) Because the objectives of scientometrics are essentially political, we
consider it useful to calculate the impact factors as a geometric average of the
works cited in:


a) ISI reviews, proceedings or scientific monographs of the scientific 
works published abroad, 


b) ibid., for the scientific works published in the frame of some scientific 
reviews from Romania, 


c) some PhD dissertations. 
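
One possible reading of conclusion (5) is a geometric mean of the citation counts collected in the three categories a), b) and c). The sketch below follows that reading only as an assumption; the function name, the treatment of a zero count and the sample numbers are hypothetical.

```python
import math


def geometric_mean_impact(citation_counts):
    """Geometric mean of non-negative citation counts (hypothetical helper).

    A zero count in any category makes the geometric mean zero.
    """
    counts = list(citation_counts)
    if not counts:
        raise ValueError("at least one citation count is required")
    if any(c < 0 for c in counts):
        raise ValueError("citation counts must be non-negative")
    if any(c == 0 for c in counts):
        return 0.0
    return math.exp(sum(math.log(c) for c in counts) / len(counts))


if __name__ == "__main__":
    # Assumed counts: a) works published abroad, b) Romanian reviews, c) PhD theses.
    print(geometric_mean_impact([8, 1, 1]))
```

Unlike an arithmetic mean, the geometric mean rewards a balanced presence in all three categories: a large count in one category cannot compensate for the absence of citations in another.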